Evaluation of large language models' diagnostic performance based on ultrasound and mammography reports and images

Guangzhou Medical Journal, 2026, Vol. 57, Issue (1): 70-76. DOI: 10.20223/j.cnki.1000-8535.2026.01.010

LYU Jiayi¹,², TONG Wenjuan³, LIN Xinxin³, LIN Yadan², WANG Wei³, GUO Yuan⁴, YANG Hong¹,²

1 Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co-constructed by the Province and Ministry, Guangxi Medical University, Nanning 530021, China;
2 Department of Medical Ultrasound, the First Affiliated Hospital of Guangxi Medical University, Nanning 530021, China;
3 Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Lab, Institute of Diagnostic and Interventional Ultrasound, The First Affiliated Hospital of Sun Yat-Sen University, Guangzhou 510000, China;
4 Department of Radiology, Guangzhou First People's Hospital, the Second Affiliated Hospital, School of Medicine, South China University of Technology, Guangzhou 510180, China

Received: 2025-04-13 Published: 2026-02-03

Abstract: Objective To evaluate the performance of ChatGPT 4 and a fine-tuned Llama 3 model in breast cancer diagnosis, particularly on unstructured reports and images from ultrasound, mammography, and the two combined. Methods Data were retrospectively collected from 689 patients who underwent both breast ultrasound and mammography. The diagnostic performance of the two models was compared in the text and image modalities, and the effect of breast density on model performance was examined. Results In the text modality, the fine-tuned Llama 3 model performed well, reaching an accuracy of 91.7% for combined (ultrasound plus mammography) diagnosis versus 71.7% for ChatGPT 4. In the image modality, both models had accuracies below 70%; ChatGPT 4 showed the higher sensitivity (78.3%), whereas Llama 3 showed high specificity (98.3%). Subgroup analysis indicated that mammography performed better in non-dense breasts, whereas ultrasound was more advantageous in dense breasts. Conclusions Large language models still require further optimization in medical image processing and multimodal integration, but large language models fine-tuned for the medical domain show potential for handling unstructured clinical text.

Key words: large language model, breast cancer, ultrasound, mammography
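
For readers less familiar with the three metrics reported in the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix, with malignant taken as the positive class. The class name `ConfusionCounts`, the function `diagnostic_metrics`, and the counts are hypothetical and chosen only for illustration; they are not the study's code or data, and the sketch assumes both classes are present so that no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with malignant as the positive class."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,        # all correct calls / all cases
        "sensitivity": c.tp / (c.tp + c.fn),      # detected malignant / all malignant
        "specificity": c.tn / (c.tn + c.fp),      # detected benign / all benign
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=45, fp=20, tn=30, fn=5)
metrics = diagnostic_metrics(example)
print(metrics)
```

With these illustrative counts the model would be sensitive but not very specific, the same kind of trade-off the abstract describes between the two models in the image modality.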

LYU Jiayi, TONG Wenjuan, LIN Xinxin, LIN Yadan, WANG Wei, GUO Yuan, YANG Hong. Evaluation of large language models' diagnostic performance based on ultrasound and mammography reports and images[J]. Guangzhou Medical Journal, 2026, 57(1): 70-76.

